Why Git/GitHub?
Git Basics
Workflow
Next Steps
Resources
We already have tools for tracking changes and collaboration for text-based documents. Why do we need version control solutions like GitHub?
Changes in one line of code can drastically impact other areas in non-obvious ways.
We want to have a historical record of changes to provide insight for others and our future selves.
We often need to prototype portions of our code without impacting what is already working in production.
We need more than one person to be able to work on a document at once.
Git
Open source version control software for managing software projects
Very low level: the core tool is used through the CLI (i.e., the terminal)
Provides core functions (tracking changes, commits, pull/push, etc.)
GitHub
Web-based platform built on top of Git that adds an easier-to-use interface, collaboration tools, an issue tracking system, etc.
Widely used across the industry
A single source of truth
There are a few key concepts that are important for being successful with Git:
Repository (repo): a centralized folder/directory where all of the code for a given project lives and is tracked by Git.
Commit: a snapshot of all the files in a repository at a point in time.
Pull changes from the server (remote) that are missing locally.
Push changes (the commit) that exist locally but not on the server (remote)
A repo is a folder/directory that is being tracked by Git.
I would suggest creating the repo first in GitHub and then cloning it locally.
While you can go from an existing local project to GitHub, it’s more work, so I generally don’t suggest it.
I recommend creating private repos to start with, since public repos are accessible to…well, the public.
Click on the Code button to copy the repo’s URL
For RStudio users, I recommend creating a new project using Version Control. We only have to do this once per repo (per machine).
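For anyone not using RStudio, the same clone step can be done from the terminal. Here is a minimal sketch that runs as a throwaway script. So that it works without a network connection or a GitHub account, a local bare repo stands in for GitHub; in practice `remote_url` would be the URL copied from the Code button. The name `my-project` is a placeholder.

```shell
set -e
work="$(mktemp -d)"                  # scratch area so nothing real is touched

# Stand-in for a repo created on GitHub (assumption for this sketch).
# Normally remote_url is the HTTPS or SSH URL copied from the Code button.
git init --quiet --bare "$work/my-project.git"
remote_url="$work/my-project.git"

# Clone it locally. This is done once per repo, per machine.
cd "$work"
git clone --quiet "$remote_url" my-project 2>/dev/null   # hides the "empty repository" warning

cd my-project
git remote -v                        # "origin" points back at the URL we cloned from
```

After cloning, the local folder already knows where its remote lives, which is why creating the repo on GitHub first is less work than connecting an existing folder later.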
After we make changes to files located in a repo that Git is tracking, we can commit those changes.
Commit message: first line under 50 characters, followed by a blank line, then additional details if necessary.
If you’re writing a lot, then the commit is probably too big (you’re committing too infrequently), the message is too verbose, or some of the details should live elsewhere (comments, issues, etc.).
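As a rough sketch of what a commit looks like from the terminal: stage the changed file, then commit with a short subject and an optional body. The repo, file name, identity, and message text below are all placeholders made up for illustration.

```shell
set -e
repo="$(mktemp -d)"                       # throwaway repo for the sketch
git init --quiet "$repo"
cd "$repo"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo 'summary(cars)' > analysis.R         # hypothetical file we just changed
git add analysis.R                        # stage the change

# The first -m is the short subject line; the second -m becomes the body,
# which Git separates from the subject with a blank line.
git commit --quiet \
  -m "Add summary of cars dataset" \
  -m "First pass at exploring the data before modeling."

git log -1 --format=%B                    # print the full message of the last commit
```

IDEs like RStudio wrap the same two steps (stage, then commit) in their Git pane.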
Sometimes, there are files within a directory that we don’t want Git to track:
Large files
Sensitive files
Package/Library-related files
.gitignore
When we create a new project in RStudio using version control (in our case, Git), RStudio automatically creates a .gitignore file listing a few key files that don’t need to be tracked by Git.
You can add files to .gitignore manually, or use your IDE: most IDEs have a UI for adding them.
Files matched by .gitignore won’t be tracked by Git, nor will they show up in GitHub.
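To make this concrete, here is a small sketch of a .gitignore and how to check that it works. The first four entries are the ones RStudio typically writes for a new project; the rest are hypothetical additions covering the large, sensitive, and package-related cases above, and every file name here is made up for the example.

```shell
set -e
repo="$(mktemp -d)"                  # throwaway repo for the sketch
git init --quiet "$repo"
cd "$repo"

cat > .gitignore <<'EOF'
# Typically created by RStudio
.Rproj.user
.Rhistory
.RData
.Ruserdata
# Hypothetical additions: large data, secrets, package library
data/raw/*.csv
.Renviron
renv/library/
EOF

# Create one file of each kind (all hypothetical)
mkdir -p data/raw
echo "id,value" > data/raw/big.csv
echo "API_KEY=not-a-real-key" > .Renviron
echo 'print("hello")' > script.R

git status --short                   # lists only .gitignore and script.R as untracked
git check-ignore -v .Renviron data/raw/big.csv   # shows which rule ignores each file
```

`git check-ignore -v` is handy when a file is unexpectedly missing from a commit, since it reports the exact line of .gitignore responsible.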
People usually say “Push/Pull”, but I’m being very specific in ordering it as Pull/Push. We always want to pull before we push.
As a reminder:
Pull changes from the server (remote) that are missing locally.
Push changes (the commit) that exist locally but not on the server (remote)
What happens in the following scenario:
If Farshad has already pushed his fix for Bug B and I then try to push my new feature, the remote will reject my push because it contains a commit that I don’t have locally. If his fix also changed the same code as my new feature, I will have conflicts to resolve when I pull.
This means I need to go back, pull the changes from the remote, make sure my code for Feature A still works properly, and then push it to the remote.
This doesn’t necessarily happen often, but it’s a best practice to avoid it by always pulling before pushing.
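The situation can be reproduced locally. In this sketch, a bare repo stands in for GitHub and two clones stand in for Farshad and me; the helper `new_clone`, the file names, and the identities are all assumptions made up for the example. The two commits touch different files, so the pull merges cleanly.

```shell
set -e
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"        # stand-in for the GitHub remote

# Hypothetical helper: clone the remote and set a placeholder identity.
new_clone() {
  git clone --quiet "$work/remote.git" "$work/$1" 2>/dev/null
  git -C "$work/$1" config user.name "$1"
  git -C "$work/$1" config user.email "$1@example.com"
}

# I create the first commit and push it.
new_clone me
cd "$work/me"
echo "base" > app.R
git add app.R
git commit --quiet -m "Initial commit"
git push --quiet -u origin HEAD

# Farshad clones, fixes Bug B, and pushes before I do.
new_clone farshad
cd "$work/farshad"
echo "fix" > bug_b.R
git add bug_b.R
git commit --quiet -m "Fix Bug B"
git push --quiet origin HEAD

# Meanwhile I commit Feature A without pulling, then try to push.
cd "$work/me"
echo "feature" > feature_a.R
git add feature_a.R
git commit --quiet -m "Add Feature A"
if git push --quiet origin HEAD 2>/dev/null; then
  first_push="accepted"
else
  first_push="rejected"
fi
echo "First push: $first_push"                    # prints "First push: rejected"

# Pull first (merging Farshad's commit into mine), then push again.
git pull --quiet --no-rebase --no-edit
git push --quiet origin HEAD
git log --oneline                                 # now includes both commits plus the merge
```

Had I pulled before committing Feature A, the first push would have gone through without the extra round trip.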
Why Git/GitHub
Basics of Git
The complete workflow from start to finish
Create/Clone Repositories
Commits
Pull/Push
GitHub issues are a great way of keeping track of new features, bugs, and other tasks associated with our projects.
A space to add more context to the code that is searchable by others and our future selves
Assign issues to individuals, add labels, etc.
Link issues to commits
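One way to link a commit to an issue is to mention the issue number in the commit message. GitHub turns `#12` into a link, and a closing keyword such as `Fixes #12` closes the issue once the commit reaches the default branch. In this sketch the issue number, file, and message are hypothetical.

```shell
set -e
repo="$(mktemp -d)"                       # throwaway repo for the sketch
git init --quiet "$repo"
cd "$repo"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo 'x <- na.omit(x)' > clean_data.R     # hypothetical change
git add clean_data.R

# "#12" is a made-up issue number; use the number shown on the issue page.
git commit --quiet \
  -m "Handle missing values when cleaning data" \
  -m "Fixes #12"

git log -1 --format=%B                    # the reference travels with the commit
```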
In the process of creating this presentation, I created an issue for the Posit/Quarto team:
GitHub is all about branches. We haven’t discussed them yet, but so far we’ve been working only in the repo’s main branch. When working on a feature or bug, we can create a branch and then merge it back into main when we’re done.
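A minimal local sketch of that branch-and-merge cycle follows. The branch name `feature-a` and the files are placeholders, and the default branch name is looked up rather than hard-coded because it may be `main` or `master` depending on Git settings. On GitHub the merge step is usually done through a pull request instead of `git merge`.

```shell
set -e
repo="$(mktemp -d)"                       # throwaway repo for the sketch
git init --quiet "$repo"
cd "$repo"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "base" > app.R
git add app.R
git commit --quiet -m "Initial commit"
main_branch="$(git rev-parse --abbrev-ref HEAD)"   # "main" or "master"

# Create a branch for the new feature and switch to it.
git checkout --quiet -b feature-a
echo "feature" > feature_a.R
git add feature_a.R
git commit --quiet -m "Add Feature A"

# Back on the main branch, the feature file does not exist yet...
git checkout --quiet "$main_branch"
ls                                        # app.R only

# ...until we merge the feature branch and clean it up.
git merge --quiet --no-edit feature-a
git branch --quiet -d feature-a
ls                                        # app.R and feature_a.R
```

The main branch stays in a working state the whole time; unfinished work only lives on the feature branch.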
GitHub Projects are built on Issues, which makes it easier to manage our work.